Trustable Symbolic Regression Models: Using Ensembles, Interval Arithmetic and Pareto Fronts to Develop Robust and Trust-aware Models
Abstract
Trust is a major issue with deploying empirical models in the real world since changes in the underlying system or use of the model in new regions of parameter space can produce (potentially dangerous) incorrect predictions. The trepidation involved with model usage can be mitigated by assembling ensembles of diverse models and using their consensus as a trust metric, since these models will be constrained to agree in the data region used for model development and also constrained to disagree outside that region. The problem is to define an appropriate model complexity (since the ensemble should consist of models of similar complexity), as well as to identify diverse models from the candidate model set. In this chapter we discuss strategies for the development and selection of robust models and model ensembles and demonstrate those strategies against industrial data sets. An important benefit of this approach is that all available data may be used in the model development rather than being partitioned into training, test and validation subsets. The result is that the constituent models are more accurate without risk of over-fitting, the ensemble predictions are more accurate, and the ensemble predictions have a meaningful trust metric.
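The consensus idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three models are hand-written stand-ins for evolved expressions (all approximate sin(x) on a development region of roughly [0, 1]), the ensemble prediction is taken as the median, and the disagreement measure is the population standard deviation of the member predictions. The chapter itself may use different models and a different spread measure.

```python
from statistics import median, pstdev

# Hypothetical ensemble members: three structurally different expressions
# that all approximate sin(x) closely on the development region [0, 1].
def taylor3(x):
    return x - x**3 / 6

def taylor5(x):
    return x - x**3 / 6 + x**5 / 120

def pade(x):
    return x * (1 - 7 * x**2 / 60) / (1 + x**2 / 20)

ENSEMBLE = [taylor3, taylor5, pade]

def predict_with_trust(models, x):
    """Return (prediction, disagreement) for input x.

    prediction   : median of the member predictions
    disagreement : spread of the member predictions; a small value means
                   consensus, a large value signals extrapolation
    """
    preds = [m(x) for m in models]
    return median(preds), pstdev(preds)

if __name__ == "__main__":
    for x in (0.5, 1.0, 4.0):
        y, spread = predict_with_trust(ENSEMBLE, x)
        print(f"x={x:4.1f}  prediction={y:8.4f}  disagreement={spread:.4f}")
```

Inside the development region the members nearly coincide, so the disagreement is small; at x = 4 they diverge sharply, which flags the prediction as untrustworthy.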
Similar resources
Metamodeling by symbolic regression and Pareto simulated annealing
The subject of this paper is a new approach to symbolic regression. Other publications on symbolic regression use genetic programming. This paper describes an alternative method based on Pareto simulated annealing. Our method is based on linear regression for the estimation of constants. Interval arithmetic is applied to ensure the consistency of a model. To prevent overfitting, we merit a mode...
Interval Arithmetic and Interval-Aware Operators for Genetic Programming
Symbolic regression via genetic programming is a flexible approach to machine learning that does not require up-front specification of model structure. However, traditional approaches to symbolic regression require the use of protected operators, which can lead to perverse model characteristics and poor generalisation. In this paper, we revisit interval arithmetic as one possible solution to allo...
Multi-objective Pareto optimization of bone drilling process using NSGA II algorithm
Bone drilling is one of the most common processes in orthopedic surgery and in the treatment of bone fractures. It is also very frequent in dentistry and bone sampling operations. Bone is a complex material and the machining process itself is sensitive, so bone drilling is one of the most important, common and sensitive processes in the field of biomedical engineering. Orthopedic surgeries can be improved...
GPTIPS 2
GPTIPS is a free, open source MATLAB based software platform for symbolic data mining (SDM). It uses a ‘multigene’ variant of the biologically inspired machine learning method of genetic programming (MGGP) as the engine that drives the automatic model discovery process. Symbolic data mining is the process of extracting hidden, meaningful relationships from data in the form of symbolic equations...
A confidence-aware interval-based trust model
It is a common and useful task in a web of trust to evaluate the trust value between two nodes using intermediate nodes. This technique is widely used when the source node has no experience of direct interaction with the target node, or the direct trust is not reliable enough by itself. If trust is used to support decision-making, it is important to have not only an accurate estimate of trust, ...